State-of-the-art speech recognition systems use data-intensive context-dependent phonemes as acoustic units. However, these approaches do not translate well to low-resource languages, where large amounts of training data are not available. For such languages, automatic discovery of acoustic units is critical. In this paper, we demonstrate the application of nonparametric Bayesian models to acoustic unit discovery. We show that the discovered units are correlated with phonemes and are therefore linguistically meaningful. We also present a query-by-example spoken term detection (STD) algorithm based on these automatically learned units. We show that our proposed system produces a P@N of 61.2% and an EER of 13.95% on the TIMIT dataset. The improvement in the EER is 5%, while the P@N is only slightly lower than that of the best reported system in the literature.